rMPI: An MPI-Compliant Message Passing Library for Tiled Architectures
Abstract
Next-generation microprocessors will increasingly rely on parallelism, as opposed to frequency scaling, for improvements in performance. Microprocessor designers are attaining such parallelism by placing multiple processing cores on a single piece of silicon. As the architecture of modern computer systems evolves from single monolithic cores to multiple cores, its programming models continue to evolve. Programming parallel computer systems has historically been quite challenging because the programmer must orchestrate both computation and communication. A number of different models have evolved to help the programmer with this arduous task, from standardized shared memory and message passing application programming interfaces, to automatically parallelizing compilers that attempt to achieve performance and correctness similar to that of hand-coded programs. One of the most widely used standard programming interfaces is the Message Passing Interface (MPI). This thesis contributes rMPI, a robust, deadlock-free, high-performance design and implementation of MPI for the Raw tiled architecture. rMPI's design constitutes the marriage of the MPI interface and the Raw system, allowing programmers to apply a well-understood programming model to a novel high-performance parallel computer. rMPI introduces robust, deadlock-free, and high-performance mechanisms to program Raw; offers an interface to Raw that is compatible with current MPI software; gives programmers already familiar with MPI an easy interface with which to program Raw; and gives programmers fine-grain control over their programs when trusting automatic parallelization tools is not desirable. Experimental evaluations show that the resulting library has relatively low overhead, scales well with increasing message sizes for a number of collective algorithms, and enables respectable speedups for real applications.
Thesis Supervisor: Anant Agarwal
Title: Professor of Electrical Engineering and Computer Science
Similar resources
rMPI: Message Passing on Multicore Processors with On-Chip Interconnect
With multicore processors becoming the standard architecture, programmers are faced with the challenge of developing applications that capitalize on multicore’s advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the fir...
Message Passing On Communication-Exposed Multi-Core Processors
Next-generation microprocessors will increasingly rely on parallelism, as opposed to frequency scaling, for improvements in performance scalability. Microprocessor designers are attaining such parallelism by placing multiple processing cores on a single silicon die. Current commercial multi-core processors such as the POWER and AMD Opteron force inter-processor communication to go through the...
MPI Benchmarking Revisited: Experimental Design and Reproducibility
The Message Passing Interface (MPI) is the prevalent programming model used on today’s supercomputers. Therefore, MPI library developers are looking for the best possible performance (shortest run-time) of individual MPI functions across many different supercomputer architectures. Several MPI benchmark suites have been developed to assess the performance of MPI implementations. Unfortunately, t...
The Interoperable Message Passing Interface (IMPI) Extensions to LAM/MPI
Interoperable MPI (IMPI) is a protocol specification to allow multiple MPI implementations to cooperate on a single MPI job. Unlike portable MPI implementations, an IMPI-connected parallel job allows the use of vendor-tuned message passing libraries on given target architectures, thus potentially allowing higher levels of performance than previously possible. Additionally, the IMPI protocol use...
MPI on BlueGene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System
The Blue Gene/L supercomputer uses system-on-a-chip integration and a highly scalable 65,536-node cellular architecture to deliver 360 Teraflops of peak computing power. Efficient operation of the machine requires a fast, scalable, and standards-compliant MPI library. Researchers at IBM and Argonne National Labs are porting the MPICH2 library to Blue Gene/L. We present the current state of the ...
Publication date: 2005